A New Approach for Handling Null Values in Web Log Using KNN and Tabu Search KNN
نویسندگان
چکیده
When the data mining procedures deals with the extraction of interesting knowledge from web logs is known as Web usage mining. The result of any mining is successful, only if the dataset under consideration is well preprocessed. One of the important preprocessing steps is handling of null/missing values. Handlings of null values have been a great bit of test for researcher. Various methods are available for estimation of null value such as k-means clustering algorithm, MARE algorithm and fuzzy logic approach. Although all these process are not always efficient. We propose an efficient approach for handling null values in web log. We are using a hybrid tabu search – k nearest neighbor classifier with multiple distance function. Tabu search – KNN classifier perform feature selection of K-NN rules. We are handling null values efficiently by using different distance function. It is called Ensemble of function. It gives different set of feature vector. Feature selection is useful for improving the classification accuracy of NN rule. We are using different distance metric with different set of feature, so it reduces the possibility that some error will common. Therefore, proposed method is better for handling null values. The proposed method is using hybrid classifier with different distance metrics and different feature vector. It is evaluated using our MANIT database. Results have indicated that a significant increase in the performance when compared with simple K-NN classifier.
منابع مشابه
Tabu Search with Delta Test for Time Series Prediction using OP-KNN
This paper presents a working combination of input selection strategy and a fast approximator for time series prediction. The input selection is performed using Tabu Search with the Delta Test. The approximation methodology is called Optimally-Pruned k -Nearest Neighbors (OP-KNN), which has been recently developed for fast and accurate regression and classification tasks. In this paper we demon...
متن کاملA Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algori...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملOptimizing Collaborative Filtering by Interpolating the Individual and Group Behaviors
Collaborative filtering has been very successful in both research and E-commence applications. One of the most popular collaborative filtering algorithms is the k-Nearest Neighbor (KNN) method, which finds k nearest neighbors for a given user to predict his interests. Previous research on KNN algorithm usually suffers from the data sparseness problem, because the quantity of items users voted i...
متن کامل